MBON Acoustic Indices Study
Biostats Review — Methods & Preliminary Results
2025-12-12
Research Question
Can acoustic indices predict biological community metrics in estuarine environments?
- Location: 3 stations, May River, South Carolina
- Period: 2021 (full year)
- Responses: 9 community metrics
  - Fish: activity, richness, presence
  - Dolphins: echolocation, burst pulse, whistle, total activity, presence
  - Vessels: presence
- Predictors: ~60 acoustic indices (candidates)
Data Overview
- 13,102 observations (2-hour temporal bins)
- 4 data sources aligned to common resolution:

| Source | Contents |
|---|---|
| Detections | Manual annotations of fish/dolphin/vessel presence |
| Environment | Temperature, depth (sensor data) |
| Acoustic Indices | ~60 indices across 5 categories |
| SPL | Sound pressure levels |

- Index categories: Amplitude, Complexity, Diversity, Spectral, Temporal
- Temporal structure: station / month / day / hour
Pipeline Overview
Stage 00: Data Alignment (4 sources → 2-hour bins)
↓
Stage 01: Index Reduction (60 → 14 via correlation/VIF)
↓
Stage 02-03: Response Variables + Feature Engineering
↓
Stage 05: GLMM vs GAMM Modeling
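The Stage 00 alignment can be illustrated with a minimal sketch. It assumes each source arrives as (station, timestamp, value) records, that bins start at midnight, and that values are averaged within a bin; the station label `S1`, the readings, and the function names are hypothetical and not taken from the project code.

```python
from collections import defaultdict
from datetime import datetime

BIN_HOURS = 2  # bin width used throughout the study


def floor_to_bin(ts: datetime, bin_hours: int = BIN_HOURS) -> datetime:
    """Round a timestamp down to the start of its bin (bins start at midnight)."""
    return ts.replace(hour=ts.hour - ts.hour % bin_hours,
                      minute=0, second=0, microsecond=0)


def bin_mean(records, bin_hours: int = BIN_HOURS):
    """Average (station, timestamp, value) records within each station/bin."""
    groups = defaultdict(list)
    for station, ts, value in records:
        groups[(station, floor_to_bin(ts, bin_hours))].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


# Hypothetical temperature readings for one station (assumption, not real data)
temperature = [
    ("S1", datetime(2021, 5, 1, 0, 20), 20.0),
    ("S1", datetime(2021, 5, 1, 1, 40), 22.0),
    ("S1", datetime(2021, 5, 1, 2, 5), 23.0),
]
binned = bin_mean(temperature)
```

Once every source is keyed by (station, bin start), the four sources can be joined on that key. Count-type sources such as detections would probably be summed rather than averaged.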
Why both models?
- GLMM: Assumes linear predictor-response relationships
- GAMM: Allows non-linear (smooth) relationships
We fit both and compare — if relationships are truly linear, prefer GLMM (simpler); if non-linear, GAMM will fit better.
Model Specifications
GLMM (glmmTMB)
- Linear fixed effects for all predictors
- AR1 autocorrelation within days
- Random intercepts: station, month
- REML = FALSE (for AIC comparison)
GAMM (mgcv::bam)
- Smooth terms (k=5) for indices & covariates
- Cyclic splines for hour, day-of-year
- Random effects: station, month
- AR1 via rho parameter
Both models:
- Negative binomial (nbinom2) for count responses
- Binomial for presence/absence responses
- Selection rule: Prefer simpler model if ΔAIC < 4
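The selection rule can be written out explicitly. This is a sketch of one reading of the rule: ΔAIC is taken as AIC(GLMM) minus AIC(GAMM), and the GLMM is kept whenever the GAMM's advantage is below the threshold, including when the GAMM is worse. The function name and the example AIC values are hypothetical.

```python
def select_model(aic_glmm: float, aic_gamm: float, threshold: float = 4.0) -> str:
    """Keep the simpler GLMM unless the GAMM improves AIC by at least `threshold`."""
    delta_aic = aic_glmm - aic_gamm  # positive when the GAMM has the lower AIC
    return "GAMM" if delta_aic >= threshold else "GLMM"


# Hypothetical AIC values, for illustration only
print(select_model(1000.0, 998.5))  # small improvement: simpler model kept
print(select_model(1000.0, 990.0))  # large improvement: GAMM selected
```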
Question 1: Index Reduction
Is our approach appropriate? Should we reduce further?
Index Reduction: What We Did
Step 1: Correlation pruning
- Removed one index from each pair with |r| > 0.6
- Result: 60 → 17 indices
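A minimal sketch of Step 1, assuming a greedy pass in column order: an index is kept only if its absolute Pearson correlation with every already-kept index is at or below the threshold. The slides do not say which member of a correlated pair was dropped, so the "drop the later one" rule here is an assumption, and the column names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either column has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: treat a constant column as uncorrelated
    return sxy / sqrt(sxx * syy)


def prune_correlated(columns, threshold=0.6):
    """Greedy pruning: keep an index only if |r| <= threshold with all kept ones."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical index values (not study data)
indices = {
    "idx_a": [1, 2, 3, 4, 5],
    "idx_b": [2, 4, 6, 8, 11],  # nearly collinear with idx_a
    "idx_c": [3, 1, 4, 1, 5],   # weakly correlated with idx_a
}
kept = prune_correlated(indices)  # idx_b is dropped
```

Because the result depends on column order, a documented tie-break (for example, keeping the index with the clearer ecological interpretation) would make the step reproducible.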
Step 2: VIF screening
- Iteratively removed indices with VIF > 2
- Result: 17 → 14 indices
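Step 2 can be sketched in the same spirit. The VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix, so the sketch below computes that inverse and repeatedly drops the predictor with the largest VIF until all are at or below the threshold. The data and names are hypothetical, and the project's actual implementation may differ.

```python
from math import sqrt


def corr_matrix(cols):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(cols[0])
    centered = [[v - sum(c) / n for v in c] for c in cols]
    norms = [sqrt(sum(v * v for v in c)) for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    names = list(columns)
    inv = invert(corr_matrix([columns[k] for k in names]))
    return {name: inv[i][i] for i, name in enumerate(names)}


def vif_screen(columns, threshold=2.0):
    """Iteratively drop the predictor with the highest VIF above `threshold`."""
    cols = dict(columns)
    while len(cols) > 1:
        scores = vif(cols)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del cols[worst]
    return list(cols)


# Hypothetical data: idx_c is close to idx_a + idx_b, so it carries the highest VIF
data = {
    "idx_a": [-1, 1, -1, 1, -1, 1, -1, 1],
    "idx_b": [-1, -1, 1, 1, -1, -1, 1, 1],
    "idx_c": [-2.5, -0.5, -0.5, 1.5, -1.5, 0.5, 0.5, 2.5],
}
retained = vif_screen(data)
```

Recomputing the VIFs after each removal matters: dropping one predictor lowers the VIFs of the others, so removing everything above the threshold in a single pass would prune too aggressively.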
Outcome:
- 14 indices retained
- All 5 categories preserved (Amplitude, Complexity, Diversity, Spectral, Temporal)
Index Reduction: Concerns
- Are 14 indices too many?
  - 14 predictors for 13K observations (roughly 900 observations per predictor)
  - Some papers use stricter reduction
- Is correlation + VIF the right approach?
  - Alternatives: PCA, LASSO, elastic net
  - Would lose direct interpretability
- Model shrinkage removed 4 more
  - GAMM select=TRUE shrunk ADI, BioEnergy, EPS_KURT, MEANt to ~zero
  - Should we formalize this as a two-stage approach?
Index Reduction: Questions for Discussion
Q1: Is correlation + VIF standard practice, or would you recommend a different approach?
Q2: Given that GAMM shrinkage removed 4 indices, should we adopt a two-stage approach (VIF → model-based selection)?
Q3: 14 predictors for 13K observations — is this ratio acceptable?
Question 2: Modeling Results
What are our results telling us? Any concerns?
Model Comparison: fish_activity

| Model | AIC | Fit time |
|---|---|---|
| GLMM | 34,903 | 3.0 min |
| GAMM | 28,833 | 0.8 sec |
| ΔAIC | 6,071 | |

Interpretation:
- ΔAIC > 10 = strong preference
- ΔAIC = 6,071 is overwhelming
- GAMM selected
Additional evidence:
- GLMM diagnostics show systematic misfit (next slides)
- Non-linear relationships justify GAMM
GAMM Results: Significant Predictors

| Predictor | EDF | p-value | Interpretation |
|---|---|---|---|
| hour_of_day | 8.24 | <0.001 | Strong diel pattern |
| ACI | 2.66 | <0.001 | Non-linear, positive |
| BI | 2.82 | <0.001 | Non-linear, negative |
| EAS | 3.08 | <0.001 | Non-linear |
| VARt | 2.94 | <0.001 | Non-linear |
| depth | 1.00 | <0.001 | Linear, negative |

Shrunk away (not significant): ADI, BioEnergy, EPS_KURT, MEANt
GAMM Smooth Plots: Overview
Smooth Zoom: hour_of_day
Observations:
- Strong diel pattern (EDF = 8.2)
- Peak activity ~8 PM (hour 20)
- Lowest ~10 AM (hour 10)
Validation:
- Matches known fish calling behavior
- Model is capturing real biology
Smooth Zoom: BI (negative relationship)
Observation:
- Higher BI → less fish activity
- Counterintuitive?
Possible explanations:
- BI elevated when other sources dominate (snapping shrimp?)
- Fish call when BI is lower
- Seasonal confounding?
Question: Ecologically interpretable or artifact?
Smooth Zoom: VARt (non-linear)
Observation:
- “Goldilocks” relationship
- Fish activity peaks at intermediate VARt
- Drops at both extremes
Implication:
- A GLMM with linear terms can't capture this (it fits a straight line on the link scale)
- Justifies GAMM selection
Unexpected Results
Temperature:
- GLMM: highly significant (p ≈ 0)
- GAMM: NOT significant (p = 0.12)
Day of year:
- GAMM: NOT significant (p = 0.18)
- Despite visible seasonality in the data
Hypothesis: Acoustic indices absorb the seasonal/temperature signal?
Methodological Concerns
- AR1 comparability
  - GLMM: ar1(time_within_day + 0 | day_id)
  - GAMM: rho parameter
  - Are these equivalent structures?
- AIC comparison validity
  - GLMM uses ML (REML = FALSE)
  - GAMM uses fREML (method = "fREML")
  - Technically not directly comparable?
- 10 of 14 indices significant
  - Genuine signal or overfitting?
- Depth sign flip
  - GLMM: positive (+0.23)
  - GAMM: negative
  - Suggests GLMM mis-specified?
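For reference when discussing AR1 comparability, the sketch below shows what an AR1 structure implies within a single day: the correlation between two bins decays as rho raised to the number of bins separating them. The value rho = 0.5 is hypothetical, and the sketch does not reproduce how either glmmTMB or mgcv applies the structure internally.

```python
def ar1_corr(n_bins: int, rho: float):
    """AR1 correlation matrix: corr(bin i, bin j) = rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n_bins)] for i in range(n_bins)]


BINS_PER_DAY = 12  # 24 hours / 2-hour bins
corr = ar1_corr(BINS_PER_DAY, rho=0.5)  # rho = 0.5 is a hypothetical value

# Correlation of the first bin of the day with the next three bins
first_row = corr[0][1:4]
```

Whether the two fits are comparable depends on both using the same series boundaries (here, one series per day) and on how each package estimates or fixes rho, which is the point to confirm with the reviewers.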
GLMM Diagnostics (DHARMa)
- KS test: p ≈ 0, below reporting precision (significant deviation)
- Outlier test: p = 0.0005 (outliers detected)
- Systematic misfit supports GAMM selection
Summary: Questions for Discussion
Index reduction: Is correlation + VIF appropriate? Should we reduce further, or is model shrinkage sufficient?
Model choice: GAMM strongly preferred by AIC — any concerns? Is fREML vs ML comparison valid?
Autocorrelation: Is our AR1 handling adequate in both models?
Interpretation: Temperature/seasonality not significant in GAMM — absorbed by indices, or a problem?
Next steps: Expand to all 9 responses? Additional diagnostics? Anything we’re missing?
Additional Context
- Pilot mode: Results shown are for fish_activity only; will expand to all 9 responses
- MEANt: No real variation in raw data (numerical noise ~10⁻¹⁹) — model correctly shrunk it away
- Repository: [link] — full specs, code, and data pipeline available
Thank you! Questions?